What is claimed is:

1. A method of fusing first and second datasets, comprising:
determining a ranking of a plurality of matching variables associated with the
first and second datasets;

generating a hierarchical matching grid including a plurality of levels based on 
the ranking of the plurality of matching variables; 

identifying first and second sets of match candidates from the first and second 
datasets based on one of the plurality of levels of the hierarchical matching grid; and 

fusing records in the first and second sets of match candidates based on 
probabilities associated with the records. 

2. A method as defined in claim 1, wherein determining the ranking of 
the plurality of matching variables includes ranking the plurality of matching 
variables based on a relative strength of a relationship between each of the matching 
variables and a respondent characteristic. 

3. A method as defined in claim 1, wherein the first and second datasets 
include respondent records. 

4. A method as defined in claim 1, wherein generating the hierarchical 
matching grid including the plurality of levels based on the ranking of the plurality of 
matching variables includes generating a series of binary values so that each of a 
plurality of bit positions associated with the binary values uniquely corresponds to 
one of the plurality of matching variables. 

5. A method as defined in claim 4, wherein the series of binary values is a
sequential series of binary values. 

6. A method as defined in claim 4, wherein each of the plurality of bit
positions is assigned to its corresponding one of the plurality of matching variables so 
that higher order ones of the bit positions are associated with more important ones of 
the matching variables. 
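
The bit-position scheme recited in claims 4 to 6 can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `build_matching_grid`, the `min_level` parameter, and the choice to count down from the all-ones value are assumptions made for illustration only.

```python
from typing import Dict, List


def build_matching_grid(ranked_variables: List[str],
                        min_level: int = 0) -> List[Dict[str, bool]]:
    """Return grid levels ordered from strictest to loosest.

    ranked_variables is assumed to be ordered most important first.
    Each level is one binary value; a set bit means "records must match
    on this variable". The top-ranked variable owns the highest-order bit.
    """
    n = len(ranked_variables)
    levels = []
    # Sequential series of binary values, counting down from all bits set.
    # min_level (an assumption) stops the series at a minimum matching level.
    for value in range(2 ** n - 1, min_level - 1, -1):
        level = {}
        for position, name in enumerate(ranked_variables):
            bit = n - 1 - position  # highest-order bit for the top-ranked variable
            level[name] = bool((value >> bit) & 1)
        levels.append(level)
    return levels


grid = build_matching_grid(["age", "sex", "region"])
```

Under this ordering, every level that requires a match on the most important variable comes before any level that relaxes it, which is one plausible reading of assigning higher-order bits to more important variables.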

7. A method as defined in claim 1, wherein generating the
hierarchical matching grid including the plurality of levels based on the ranking of the 
plurality of matching variables includes generating the hierarchical matching grid to 
allow skewed matching on one or more of the matching variables. 

8. A method as defined in claim 1, wherein generating the hierarchical 
matching grid including the plurality of levels based on the ranking of the plurality of 
matching variables includes establishing a minimum matching level. 

9. A method as defined in claim 1, wherein identifying the first and
second sets of match candidates from the first and second datasets based on the one of 
the plurality of levels of the hierarchical matching grid includes using match criteria 
from the one of the plurality of levels of the hierarchical matching grid to identify 
records in the second dataset that match records in the first dataset on ones of the 
plurality of matching variables defined by the match criteria. 

10. A method as defined in claim 9, wherein the match criteria includes at 
least one of a binary value and an allowed skew. 
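
As an illustration of claims 9 and 10, the sketch below applies the match criteria of one grid level to two small datasets. The record layout (dictionaries keyed by variable name), the function names, and the reading of "allowed skew" as a numeric tolerance on a variable are all assumptions, not definitions from the source.

```python
from typing import Dict, List, Optional, Tuple


def records_match(a: Dict, b: Dict, level: Dict[str, bool],
                  skew: Optional[Dict[str, float]] = None) -> bool:
    """True if a and b agree on every variable the level requires."""
    skew = skew or {}
    for name, required in level.items():
        if not required:
            continue  # this level does not require a match on this variable
        tolerance = skew.get(name, 0)
        if tolerance:
            # Skewed matching (assumed meaning): values may differ by up to tolerance.
            if abs(a[name] - b[name]) > tolerance:
                return False
        elif a[name] != b[name]:
            return False
    return True


def find_candidates(first: List[Dict], second: List[Dict],
                    level: Dict[str, bool],
                    skew: Optional[Dict[str, float]] = None) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where second[j] matches first[i] at this level."""
    return [(i, j)
            for i, a in enumerate(first)
            for j, b in enumerate(second)
            if records_match(a, b, level, skew)]


first = [{"age": 30, "sex": "F"}]
second = [{"age": 32, "sex": "F"}, {"age": 30, "sex": "M"}]
pairs = find_candidates(first, second, {"age": True, "sex": True}, skew={"age": 2})
```

With the skew of 2 on `age`, the first record pairs with the 32-year-old record; without it, no pair satisfies both variables.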

11. A method as defined in claim 1, wherein fusing the records in the first
and second sets of match candidates based on the probabilities associated with the 
records includes establishing the probabilities based on weights associated with 
records from at least one of the first and second sets of match candidates. 
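
One way to picture claim 11 is a weighted random draw, where each candidate's selection probability is its weight divided by the total weight of its set. The claim does not state this formula; the proportional rule, the `(record id, weight)` layout, and the function names below are assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple

Record = Tuple[str, float]  # (record id, weight); a hypothetical layout


def selection_probabilities(candidates: List[Record]) -> Dict[str, float]:
    """Probability of each candidate, proportional to its weight (assumed rule)."""
    total = sum(weight for _, weight in candidates)
    return {rid: weight / total for rid, weight in candidates}


def pick_partner(candidates: List[Record], rng: random.Random) -> str:
    """Draw one candidate id with probability proportional to its weight."""
    ids = [rid for rid, _ in candidates]
    weights = [weight for _, weight in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make the sketch repeatable
partner = pick_partner([("x", 1.0), ("y", 3.0)], rng)
```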

12. A method as defined in claim 1, further comprising:
comparing a first sum of weights associated with the first set of match
candidates with a second sum of weights associated with the second set of match
candidates;

identifying one of the first and second sets of match candidates as overweight 
based on the comparison of the first and second sums of weights; and 

trimming records of the one of the first and second sets of match candidates
identified as overweight prior to fusing the records in the first and second sets of 
match candidates. 

13. A method as defined in claim 12, further comprising restoring an 
excess weight portion of trimmed records to corresponding ones of the first and 
second datasets. 
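
The comparing, trimming, and restoring steps of claims 12 and 13 might be sketched as follows. Proportional trimming (scaling every record of the overweight set by the same factor) is an assumption suggested by the "overweight proportion" language of claim 60; the function names and the `(record id, weight)` layout are hypothetical.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, float]  # (record id, weight); a hypothetical layout


def _scale(records: List[Record], factor: float):
    """Split each weight into a kept portion and an excess portion."""
    kept = [(rid, w * factor) for rid, w in records]
    excess = [(rid, w - w * factor) for rid, w in records]
    return kept, excess


def trim_overweight(first: List[Record], second: List[Record]):
    """Compare weight sums and trim the heavier set down to the lighter total.

    Returns (first, second, overweight_label, excess). The excess list holds
    the weight removed from each trimmed record, so a caller can restore it
    to the source dataset for matching at a later grid level.
    """
    sum_first = sum(w for _, w in first)
    sum_second = sum(w for _, w in second)
    if sum_first > sum_second > 0:
        kept, excess = _scale(first, sum_second / sum_first)
        return kept, list(second), "first", excess
    if sum_second > sum_first > 0:
        kept, excess = _scale(second, sum_first / sum_second)
        return list(first), kept, "second", excess
    return list(first), list(second), None, []


result = trim_overweight([("a", 6.0), ("b", 4.0)], [("x", 5.0)])
```

In this example the first set carries a total weight of 10 against 5, so each of its records keeps half of its weight and the other half is reported as excess.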

14. A method as defined in claim 13, further comprising identifying third 
and fourth sets of match candidates from the first and second datasets including the 
restored excess weight portion of the trimmed records based on a second level of the 
hierarchical matching grid; and 

fusing records in the third and fourth sets of match candidates based on 
probabilities associated with records within at least one of the third and fourth sets of 
match candidates. 

15. A method as defined in claim 1, further comprising validating fused
records based on index values generated using at least one match percentage. 



PATENT 

NIELSEN MEDIA RESEARCH/221-US

16. A system for fusing first and second datasets, comprising:
a memory; and

a processor coupled to the memory and configured to:

determine a ranking of a plurality of matching variables associated
with the first and second datasets;

generate a hierarchical matching grid including a plurality of levels
based on the ranking of the plurality of matching variables;

identify first and second sets of match candidates from the first and
second datasets based on one of the plurality of levels of the hierarchical matching
grid; and

fuse records in the first and second sets of match candidates based on
probabilities associated with the records.

17. A system as defined in claim 16, wherein the processor is configured 
to determine the ranking of the plurality of matching variables by ranking the plurality 
of matching variables based on a relative strength of a relationship between each of 
the matching variables and a respondent characteristic. 

18. A system as defined in claim 16, wherein the first and second datasets 
include respondent records. 

19. A system as defined in claim 16, wherein the processor is configured 
to generate the hierarchical matching grid including the plurality of levels based on 
the ranking of the plurality of matching variables by generating a series of binary 
values so that each of a plurality of bit positions associated with the binary values 
uniquely corresponds to one of the plurality of matching variables. 

20. A system as defined in claim 19, wherein the series of binary values is
a sequential series of binary values. 

21. A system as defined in claim 19, wherein each of the plurality of bit 
positions is assigned to its corresponding one of the plurality of matching variables so 
that higher order ones of the bit positions are associated with more important ones of 
the matching variables. 

22. A system as defined in claim 16, wherein the processor is configured 
to generate the hierarchical matching grid having the plurality of levels based on the 
ranking of the plurality of matching variables by generating the hierarchical matching 
grid to allow skewed matching on one or more of the matching variables. 

23. A system as defined in claim 16, wherein the processor is configured 
to generate the hierarchical matching grid including the plurality of levels based on 
the ranking of the plurality of matching variables by establishing a minimum 
matching level. 

24. A system as defined in claim 16, wherein the processor is configured 
to identify the first and second sets of match candidates from the first and second 
datasets based on the one of the plurality of levels of the hierarchical matching grid by 
using match criteria from the one of the plurality of levels of the hierarchical 
matching grid to identify records in the second dataset that match records in the first 
dataset on ones of the plurality of matching variables defined by the match criteria. 

25. A system as defined in claim 24, wherein the match criteria includes at 
least one of a binary value and an allowed skew. 

26. A system as defined in claim 16, wherein the processor is configured
to fuse the records in the first and second sets of match candidates based on the 
probabilities associated with the records by establishing the probabilities based on 
weights associated with records from at least one of the first and second sets of match 
candidates. 

27. A system as defined in claim 16, wherein the processor is configured
to:

compare a first sum of weights associated with the first set of match 
candidates with a second sum of weights associated with the second set of match 
candidates; 

identify one of the first and second sets of match candidates as overweight 
based on the comparison of the first and second sums of weights; and 

trim records of the one of the first and second sets of match candidates 
identified as overweight prior to fusing the records in the first and second sets of 
match candidates. 

28. A system as defined in claim 27, wherein the processor is configured 
to restore an excess weight portion of trimmed records to corresponding ones of the 
first and second datasets. 

29. A system as defined in claim 28, wherein the processor is configured
to:

identify third and fourth sets of match candidates from the first and second 
datasets including the restored excess weight portion of the trimmed records based on 
a second level of the hierarchical matching grid; and 

fuse records in the third and fourth sets of match candidates based on 
probabilities associated with records within at least one of the third and fourth sets of 
match candidates. 

30. A system as defined in claim 16, wherein the processor is configured 
to validate fused records based on index values generated using at least one match 
percentage. 

31. A machine readable medium having instructions stored thereon that, 
when executed, cause a machine to: 

determine a ranking of a plurality of matching variables associated with first 
and second datasets; 

generate a hierarchical matching grid including a plurality of levels based on 
the ranking of the plurality of matching variables; 

identify first and second sets of match candidates from the first and second 
datasets based on one of the plurality of levels of the hierarchical matching grid; and 

fuse records in the first and second sets of match candidates based on 
probabilities associated with the records. 




32. A machine readable medium as defined in claim 31 having instructions 
stored thereon that, when executed, cause the machine to determine the ranking of the 
plurality of matching variables by ranking the plurality of matching variables based 
on a relative strength of a relationship between each of the matching variables and a 
respondent characteristic. 

33. A machine readable medium as defined in claim 31 having instructions 
stored thereon that, when executed, cause the machine to generate the hierarchical 
matching grid including the plurality of levels based on the ranking of the plurality of 
matching variables by generating a series of binary values so that each of a plurality 
of bit positions associated with the binary values uniquely corresponds to one of the 
plurality of matching variables. 

34. A machine readable medium as defined in claim 31 having instructions
stored thereon that, when executed, cause the machine to generate the hierarchical 
matching grid including the plurality of levels based on the ranking of the plurality of 
matching variables by generating the hierarchical matching grid to allow skewed 
matching on one or more of the matching variables. 

35. A machine readable medium as defined in claim 31 having instructions 
stored thereon that, when executed, cause the machine to establish a minimum 
matching level within the hierarchical matching grid. 

36. A machine readable medium as defined in claim 31 having instructions
stored thereon that, when executed, cause the machine to identify the first and second 
sets of match candidates from the first and second datasets based on the one of the 
plurality of levels of the hierarchical matching grid by using match criteria from the 
one of the plurality of levels of the hierarchical matching grid to identify records in 
the second dataset that match records in the first dataset on ones of the plurality of 
matching variables defined by the match criteria. 

37. A machine readable medium as defined in claim 31 having instructions 
stored thereon that, when executed, cause the machine to fuse the records in the first 
and second sets of match candidates based on the probabilities associated with the 
records by establishing the probabilities based on weights associated with records 
from at least one of the first and second sets of match candidates. 

38. A machine readable medium as defined in claim 31 having instructions 
stored thereon that, when executed, cause the machine to: 

compare a first sum of weights associated with the first set of match 
candidates with a second sum of weights associated with the second set of match 
candidates; 

identify one of the first and second sets of match candidates as overweight 
based on the comparison of the first and second sums of weights; and 

trim records of the one of the first and second sets of match candidates 
identified as overweight prior to fusing the records in the first and second sets of 
match candidates. 

39. A machine readable medium as defined in claim 38 having instructions
stored thereon that, when executed, cause the machine to restore an excess weight 
portion of trimmed records to corresponding ones of the first and second datasets. 

40. A machine readable medium as defined in claim 39 having 
instructions stored thereon that, when executed, cause the machine to: 

identify third and fourth sets of match candidates from the first and second 
datasets including the restored excess weight portion of the trimmed records based on 
a second level of the hierarchical matching grid; and 

fuse records in the third and fourth sets of match candidates based on 
probabilities associated with records within at least one of the third and fourth sets of 
match candidates. 

41. A machine readable medium as defined in claim 31 having instructions 
stored thereon that, when executed, cause the machine to validate the fused records 
based on index values generated using at least one match percentage. 

42. A system for fusing datasets, comprising: 

a match grid generator configured to generate a hierarchical matching grid 
having a plurality of levels based on a ranking of a plurality of matching variables 
associated with first and second datasets; 

a match candidate identifier configured to identify first and second sets of 
match candidates from the first and second datasets based on at least one of the 
plurality of levels of the hierarchical matching grid; and 

a matcher configured to fuse records in the first and second sets of match 
candidates based on probabilities associated with the records. 

43. A system as defined in claim 42, further comprising a ranker
configured to determine the ranking of the plurality of matching variables based on a 
relative strength of a relationship between each of the matching variables and a 
respondent characteristic. 

44. A system as defined in claim 42, wherein each of the first and second 
datasets includes respondent records. 

45. A system as defined in claim 42, wherein the match grid generator is 
configured to generate the hierarchical matching grid by generating a series of binary 
values so that each of a plurality of bit positions associated with the binary values 
uniquely corresponds to one of the plurality of matching variables. 

46. A system as defined in claim 42, wherein the match grid generator is 
configured to generate the hierarchical matching grid by generating the hierarchical 
matching grid to allow skewed matching on one or more of the matching variables. 

47. A system as defined in claim 42, wherein the match grid generator is 
configured to generate the hierarchical matching grid to establish a minimum 
matching level. 

48. A system as defined in claim 42, wherein the match candidate 
identifier is configured to identify the first and second sets of match candidates using
match criteria from the at least one of the plurality of levels of the hierarchical 
matching grid to identify records in the second dataset that match records in the first 
dataset on ones of the plurality of matching variables defined by the match criteria. 

49. A system as defined in claim 48, wherein the match criteria includes at 
least one of a binary value and an allowed skew. 

50. A system as defined in claim 42, wherein the fuser is configured to
fuse the records in the first and second sets of match candidates by establishing the 
probabilities based on weights associated with records from at least one of the first 
and second sets of match candidates. 

51. A system as defined in claim 42, further comprising: 

a weight checker configured to compare a first sum of weights associated with 
the first set of match candidates with a second sum of weights associated with the 
second set of match candidates. 

52. A system as defined in claim 51, further comprising a trimmer 
configured to identify one of the first and second sets of match candidates as 
overweight based on the comparison of the first and second sums of weights and trim 
records of the one of the first and second sets of match candidates identified as 
overweight prior to fusing the first and second sets of match candidates. 

53. A system as defined in claim 52, further comprising a restorer that 
restores an excess weight portion of trimmed records to corresponding ones of the 
first and second datasets. 

54. A system as defined in claim 42, further comprising a system 
configured to validate fusion of records performed by the matcher. 

55. A system as defined in claim 54, wherein the system configured to
validate the fusion of records performed by the matcher comprises: 

a segmenter configured to segment at least one of the first and second datasets;

a splitter configured to split the segmented at least one of the first and second
datasets into equally weighted third and fourth datasets;

a fuser configured to fuse the third and fourth datasets to form a fifth dataset;
and

an index generator configured to use at least a portion of the fifth dataset to 
generate at least one index value indicative of a performance characteristic of the 
matcher for use in validating the fusion of records performed by the matcher. 

56. A method of fusing datasets, comprising: 

ranking a plurality of matching variables associated with first and second 
datasets; and 

fusing records from the first and second datasets based on the ranking of the 
plurality of matching variables and probabilities associated with the records from the 
first and second datasets. 

57. A method as defined in claim 56, further comprising selecting the 
records from the first and second datasets based on matching criteria. 

58. A method as defined in claim 57, further comprising selecting the 
matching criteria from a hierarchical match grid including the plurality of matching
variables and a plurality of levels. 

59. A method as defined in claim 58, further comprising generating the 
hierarchical match grid so that at least one of the plurality of matching variables on at 
least one of the levels is associated with a skewed matching criteria. 

60. A method as defined in claim 56, further comprising:

trimming an overweight proportion from each of the records of one of the first 
and second datasets; and 

using the overweight proportion from each of the records of the one of the first 
and second datasets to perform a fusion after fusing the records from the first and 
second datasets. 

61. A method of fusing datasets, comprising: 

selecting first match criteria that ranks matching variables associated with first 
and second datasets; and 

fusing records from the first and second datasets based on the first match 
criteria and weights associated with the records from the first and second datasets. 

62. A method as defined in claim 61, further comprising: 

selecting second match criteria associated with third and fourth datasets; and

fusing records from the third and fourth datasets based on the second match
criteria and weights associated with two or more of the first, second, third and fourth
datasets.

63. A method of validating a data fusion process, comprising: 
splitting a first dataset into second and third datasets; 

fusing the second and third datasets to form a fourth dataset; 
calculating at least one match rate associated with the fourth dataset; and 
generating an index indicative of a performance of the data fusion process 
based on the at least one match rate. 

64. A method as defined in claim 63, further comprising segmenting the 
first dataset into a plurality of usage categories prior to splitting the first dataset. 

65. A method as defined in claim 63, further comprising:

randomly splitting the first dataset a plurality of times to generate a plurality 
of separate pairs of datasets; 

fusing each of the separate pairs of datasets to form a plurality of fused 
datasets; and 

combining the plurality of fused datasets to form the fourth dataset. 
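
The validation flow of claims 63 to 65 (split, fuse, measure a match rate, derive an index) can be outlined with the sketch below. The claims do not define the match rate or the index, so both definitions here are assumptions: the match rate is taken as the share of fused pairs whose usage categories agree, and the index as 100 times the observed rate divided by a chance rate. All names are hypothetical, and the fusion step itself is left to the caller.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(records: Sequence, rng: random.Random) -> Tuple[List, List]:
    """Randomly split one dataset into two halves (by record count)."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def match_rate(fused_pairs: Sequence[Tuple[str, str]]) -> float:
    """Share of fused pairs whose two usage categories agree (assumed definition)."""
    if not fused_pairs:
        return 0.0
    agreeing = sum(1 for left, right in fused_pairs if left == right)
    return agreeing / len(fused_pairs)


def performance_index(observed_rate: float, chance_rate: float) -> float:
    """Index of 100 means no better than chance (assumed definition)."""
    return 100.0 * observed_rate / chance_rate


left, right = split_dataset(range(10), random.Random(0))
rate = match_rate([("heavy", "heavy"), ("light", "heavy")])
index = performance_index(rate, chance_rate=0.25)
```

Repeating the random split several times and pooling the fused pairs before computing the rate would correspond to the variant recited in claim 65.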

66. A system for validating a fusion process, comprising: 

a splitter configured to split a first dataset into second and third datasets;

a fuser configured to fuse the second and third datasets to form a fourth
dataset;

a match rate calculator configured to calculate at least one match rate 
associated with the fourth dataset; and 

an index generator configured to generate an index indicative of a 
performance of the data fusion process based on the at least one match rate. 

67. A system as defined in claim 66, further comprising a segmenter 
configured to segment the first dataset into a plurality of usage categories. 